🔍 Tool Execution Analysis Report

Comprehensive analysis of tool performance and execution patterns
Generated on September 29, 2025 at 08:44 AM
Source: baseline_airline_xai_grok3_gemini2_5_flash.json

📊 Executive Summary

Total Simulations: 200
Total Tool Calls: 1162
Avg Execution Time: 0.27ms
Unique Tools: 13

💡 Key Insights

🎯 Performance Insights

  • 4 out of 13 tools have excellent performance (≥95% success rate)
  • get_reservation_details is the most frequently used tool with 488 calls
  • Overall system reliability: 65.3%

🔄 State Management Insights

  • 6 tools perform state changes, 7 are read-only
  • State-changing operations: 139 calls
  • Read-only operations: 1023 calls
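
The split above can be reproduced from the per-tool call counts in the Tool Performance Analysis table. The following is a minimal sketch, assuming the counts are copied by hand into a dictionary; the names `calls` and `state_changing` are illustrative and not part of the report generator.

```python
# Per-tool call counts, copied from the Tool Performance Analysis table.
calls = {
    "get_reservation_details": 488,
    "search_direct_flight": 164,
    "get_user_details": 158,
    "search_onestop_flight": 100,
    "update_reservation_flights": 62,
    "transfer_to_human_agents": 56,
    "get_flight_status": 56,
    "cancel_reservation": 39,
    "update_reservation_baggages": 16,
    "update_reservation_passengers": 9,
    "book_reservation": 9,
    "send_certificate": 4,
    "calculate": 1,
}

# Tools whose "State Changes" column is non-zero in the report.
state_changing = {
    "update_reservation_flights",
    "cancel_reservation",
    "update_reservation_baggages",
    "update_reservation_passengers",
    "book_reservation",
    "send_certificate",
}

state_calls = sum(n for tool, n in calls.items() if tool in state_changing)
read_only_calls = sum(calls.values()) - state_calls

print(state_calls, read_only_calls)  # 139 1023
```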

⚠️ Error Analysis

  • 204 total errors across 1 error type
  • Most problematic tool: search_direct_flight (63 errors)
  • Primary error type: ActionCheckFailure

🛠️ Tool Performance Analysis

Tool Name                      Total Calls  Success Rate  Avg Time (ms)  Performance  State Changes
get_reservation_details        488          44.5%         0.10ms         Poor         0/488
search_direct_flight           164          10.4%         0.24ms         Poor         0/164
get_user_details               158          35.4%         0.82ms         Poor         0/158
search_onestop_flight          100          100.0%        0.68ms         Excellent    0/100
update_reservation_flights     62           61.3%         0.13ms         Poor         62/62
transfer_to_human_agents       56           0.0%          0.06ms         Poor         0/56
get_flight_status              56           100.0%        0.06ms         Excellent    0/56
cancel_reservation             39           74.4%         0.15ms         Poor         39/39
update_reservation_baggages    16           56.2%         0.08ms         Poor         16/16
update_reservation_passengers  9            100.0%        0.12ms         Excellent    9/9
book_reservation               9            55.6%         0.22ms         Poor         9/9
send_certificate               4            100.0%        0.16ms         Excellent    4/4
calculate                      1            0.0%          0.11ms         Poor         0/1

🔄 State Change Analysis

Tool Name                      Category        Calls  Success Rate  Avg Time (ms)  Performance Rating
update_reservation_flights     State-Changing  62     79.0%         0.13ms         Fair
cancel_reservation             State-Changing  39     100.0%        0.15ms         Excellent
update_reservation_baggages    State-Changing  16     100.0%        0.08ms         Excellent
book_reservation               State-Changing  9      88.9%         0.22ms         Fair
update_reservation_passengers  State-Changing  9      100.0%        0.12ms         Excellent
send_certificate               State-Changing  4      100.0%        0.16ms         Excellent
get_reservation_details        Read-Only       488    100.0%        0.10ms         Excellent
search_direct_flight           Read-Only       164    100.0%        0.24ms         Excellent
get_user_details               Read-Only       158    100.0%        0.82ms         Excellent
search_onestop_flight          Read-Only       100    100.0%        0.68ms         Excellent
get_flight_status              Read-Only       56     100.0%        0.06ms         Excellent
transfer_to_human_agents       Read-Only       56     100.0%        0.06ms         Excellent
calculate                      Read-Only       1      100.0%        0.11ms         Excellent

🔥 Failure Analysis

🎯 Root Cause Analysis

Total Failures: 204
Error Rate: 17.6%
Affected Tools: 10
Error Categories: 1
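
The error rate follows directly from the totals reported in this document: 204 failures over 1162 tool calls. A small sketch of that arithmetic (variable names are illustrative):

```python
total_failures = 204   # "Total Failures" in this section
total_calls = 1162     # "Total Tool Calls" in the Executive Summary

error_rate = total_failures / total_calls
print(f"{error_rate:.1%}")  # 17.6%
```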

🚨 Primary Failure Modes

Action Check Failures

10 tools failed action validation checks:

  • search_direct_flight: 63 failures (78.8% rate)
    → Affected 19 simulation(s)
    → Example args: {'origin': 'BOS', 'destination': 'MCO', 'date': '2024-05-18'}
  • update_reservation_flights: 46 failures (54.8% rate)
    → Affected 29 simulation(s)
    → Example args: {'reservation_id': 'XEHM4B', 'cabin': 'economy', 'flights': [{'flight_number': 'HAT005', 'date': '20...
  • book_reservation: 28 failures (84.8% rate)
    → Affected 22 simulation(s)
    → Example args: {'user_id': 'mohamed_silva_9265', 'origin': 'JFK', 'destination': 'SFO', 'flight_type': 'round_trip'...
  • cancel_reservation: 22 failures (43.1% rate)
    → Affected 15 simulation(s)
    → Example args: {'reservation_id': 'XEHM4B'}
  • update_reservation_baggages: 15 failures (62.5% rate)
    → Affected 15 simulation(s)
    → Example args: {'reservation_id': 'FQ8APE', 'total_baggages': 3, 'nonfree_baggages': 0, 'payment_id': 'gift_card_81...
  • get_reservation_details: 11 failures (4.8% rate)
    → Affected 5 simulation(s)
    → Example args: {'reservation_id': 'SDZQKO'}
  • send_certificate: 8 failures (66.7% rate)
    → Affected 8 simulation(s)
    → Example args: {'user_id': 'noah_muller_9847', 'amount': 50}
  • calculate: 4 failures (100.0% rate)
    → Affected 4 simulation(s)
    → Example args: {'expression': '2 * ((350 - 122) + (499 - 127))'}
  • transfer_to_human_agents: 4 failures (100.0% rate)
    → Affected 4 simulation(s)
    → Example args: {'summary': 'User wants to change my upcoming one stop flight from ATL to LAX within reservation XEW...
  • update_reservation_passengers: 3 failures (25.0% rate)
    → Affected 3 simulation(s)
    → Example args: {'reservation_id': '3RK2T9', 'passengers': [{'first_name': 'Anya', 'last_name': 'Garcia', 'dob': '19...

⚡ Performance Impact Analysis

High-Usage Tools with Poor Performance

Tool Name                    Total Calls  Success Rate  Avg Time (ms)
get_reservation_details      488          44.5%         0.10ms
search_direct_flight         164          10.4%         0.24ms
get_user_details             158          35.4%         0.82ms
update_reservation_flights   62           61.3%         0.13ms
transfer_to_human_agents     56           0.0%          0.06ms
cancel_reservation           39           74.4%         0.15ms
update_reservation_baggages  16           56.2%         0.08ms
book_reservation             9            55.6%         0.22ms

Slowest Tools by Execution Time

Tool Name              Avg Time (ms)  Total Calls  Success Rate
get_user_details       0.82ms         158          35.4%
search_onestop_flight  0.68ms         100          100.0%
search_direct_flight   0.24ms         164          10.4%
book_reservation       0.22ms         9            55.6%
send_certificate       0.16ms         4            100.0%

💡 Key Insights

  • Most problematic tool: search_direct_flight (63 failures)
  • Primary failure mode: Action validation failures suggest issues with tool argument validation or execution logic
  • Average tool success rate: 56.7% (per-tool average, not weighted by call volume)
  • ⚠️ Low overall success rate suggests systemic issues requiring investigation
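
The 56.7% figure is consistent with a plain, unweighted mean of the 13 per-tool success rates in the Tool Performance Analysis table. That is an inference from the numbers, not something the report states, so the sketch below should be read as a plausibility check rather than the generator's actual formula. Because it uses the rounded percentages from the table, the result can differ from the reported value in the last digit.

```python
from statistics import mean

# Success rates (%) copied from the Tool Performance Analysis table.
success_rates = {
    "get_reservation_details": 44.5,
    "search_direct_flight": 10.4,
    "get_user_details": 35.4,
    "search_onestop_flight": 100.0,
    "update_reservation_flights": 61.3,
    "transfer_to_human_agents": 0.0,
    "get_flight_status": 100.0,
    "cancel_reservation": 74.4,
    "update_reservation_baggages": 56.2,
    "update_reservation_passengers": 100.0,
    "book_reservation": 55.6,
    "send_certificate": 100.0,
    "calculate": 0.0,
}

# Unweighted mean: every tool counts equally, regardless of call volume.
average_success = mean(list(success_rates.values()))
print(f"{average_success:.2f}%")
```

Since each tool counts equally here, a tool with a single call (`calculate`) moves the average as much as one with 488 calls.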

🔧 Critical Recommendations

  1. Action Validation: Review and strengthen argument validation logic for failing tools
  2. Error Handling: Implement more robust error recovery mechanisms
  3. Performance Optimization: Focus on improving poor-performing tools with high usage
  4. Monitoring: Implement enhanced monitoring and alerting for tools with high failure rates
  5. Testing: Increase test coverage for identified problematic tool patterns
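
As an illustration of recommendation 1, the sketch below checks the shape of arguments like the `search_direct_flight` example shown in the failure list (`origin`, `destination`, `date`). The function name `validate_search_args` and the specific rules (three-letter uppercase airport codes, ISO-format dates) are assumptions made for this example; the report does not describe the tool's real validation logic.

```python
import re
from datetime import date

# Assumed rule: airport codes are three uppercase letters (e.g. "BOS").
AIRPORT_CODE = re.compile(r"[A-Z]{3}")


def validate_search_args(args):
    """Return a list of problems; an empty list means the args look well-formed."""
    problems = []
    for key in ("origin", "destination"):
        value = args.get(key)
        if not isinstance(value, str) or not AIRPORT_CODE.fullmatch(value):
            problems.append(f"{key} must be a 3-letter airport code")
    try:
        # Assumed rule: dates are ISO formatted, such as "2024-05-18".
        date.fromisoformat(args.get("date", ""))
    except (TypeError, ValueError):
        problems.append("date must be an ISO date such as 2024-05-18")
    return problems


ok = validate_search_args({"origin": "BOS", "destination": "MCO", "date": "2024-05-18"})
bad = validate_search_args({"origin": "boston", "destination": "MCO", "date": "05/18/2024"})
print(ok)   # []
print(bad)
```

Returning a list of problems instead of raising on the first one lets a caller report every malformed argument in a single response.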

🔗 Tool Flow Analysis

Tool Sequence Patterns

Most common tool transitions:

  • get_reservation_details → get_reservation_details (287 times)
  • get_user_details → get_reservation_details (134 times)
  • search_direct_flight → search_onestop_flight (68 times)
  • search_direct_flight → search_direct_flight (66 times)
  • search_onestop_flight → search_direct_flight (45 times)

Self-transitions: 7 tools are frequently invoked several times in a row (a call to a tool immediately followed by another call to the same tool), indicating iterative processing patterns.
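
Transition counts of this kind can be derived by pairing each tool call with the one that follows it in the same simulation. The sketch below shows one way to do that; `count_transitions` and the two sample sequences are hypothetical and are not taken from the source JSON.

```python
from collections import Counter


def count_transitions(sequences):
    """Count (previous_tool, next_tool) pairs within each call sequence."""
    counts = Counter()
    for seq in sequences:
        # zip(seq, seq[1:]) yields each call paired with its successor.
        counts.update(zip(seq, seq[1:]))
    return counts


# Hypothetical per-simulation tool call sequences, for illustration only.
sequences = [
    ["get_user_details", "get_reservation_details", "get_reservation_details"],
    ["search_direct_flight", "search_onestop_flight", "search_direct_flight"],
]

transitions = count_transitions(sequences)
# Tools that appear at least once immediately after themselves.
self_loops = {prev for (prev, nxt) in transitions if prev == nxt}

for (prev, nxt), n in transitions.most_common():
    print(f"{prev} → {nxt} ({n} times)")
```

Counting pairs per sequence, rather than over one concatenated list, avoids creating a spurious transition between the last call of one simulation and the first call of the next.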

📋 Recommendations

🚨 High Priority Actions

  • Critical: System success rate is only 65.3%. Immediate investigation required.
  • High failure rate: 17.6% of calls are failing across the system.

⚡ Performance Optimizations

  • Poor-performing tools: 9 tools identified, including get_reservation_details (2.3% failure), search_direct_flight (38.4% failure), and get_user_details (0.0% failure)
  • High usage pattern: The 3 most-used tools (of 13 total) are get_reservation_details, search_direct_flight, and get_user_details, which together account for 810 of the 1162 calls.
  • State operation pattern: State-changing operations have 5.3% error rate vs 0.0% for read-only operations.
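
The 5.3% and 0.0% figures are consistent with an unweighted mean of per-tool error rates taken from the State Change Analysis table. This is inferred from the numbers and is not stated by the report, so the sketch below is a plausibility check under that assumption.

```python
# Success rates (%) copied from the State Change Analysis table.
state_changing_success = {
    "update_reservation_flights": 79.0,
    "cancel_reservation": 100.0,
    "update_reservation_baggages": 100.0,
    "book_reservation": 88.9,
    "update_reservation_passengers": 100.0,
    "send_certificate": 100.0,
}
read_only_success = {
    "get_reservation_details": 100.0,
    "search_direct_flight": 100.0,
    "get_user_details": 100.0,
    "search_onestop_flight": 100.0,
    "get_flight_status": 100.0,
    "transfer_to_human_agents": 100.0,
    "calculate": 100.0,
}


def mean_error_rate(success_by_tool):
    """Unweighted mean of (100 - success rate) across tools, in percent."""
    errors = [100.0 - rate for rate in success_by_tool.values()]
    return sum(errors) / len(errors)


state_error = mean_error_rate(state_changing_success)
read_only_error = mean_error_rate(read_only_success)
print(f"{state_error:.2f}% vs {read_only_error:.2f}%")
```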

📈 Enhancement Opportunities

  • System scope: Analysis covers 13 different tools across 1162 total calls.